Mutual benefits: Combining reinforcement learning with sequential sampling models
Authors
Abstract
Similar resources
Combining Stochastic Task Models with Reinforcement Learning for Dynamic Scheduling
We view dynamic scheduling as a sequential decision problem. Firstly, we introduce a generalized planning operator, the stochastic task model (STM), which predicts the effects of executing a particular task on state, time and reward using a general procedural format (pure stochastic function). Secondly, we show that effective planning under uncertainty can be obtained by combining adaptive hori...
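Per the abstract, an STM is a pure stochastic function from a state and a task to a predicted next state, elapsed time, and reward. A minimal Python sketch of that interface follows; every name and distribution here (stm_step, mean_duration, p_success, and so on) is an illustrative assumption, not taken from the paper.

    import random

    def stm_step(state, task, rng):
        # Sketch of an STM as a pure stochastic function: sample the effect
        # of executing `task` on state, elapsed time, and reward. Passing
        # the rng explicitly keeps the function pure given its inputs.
        duration = max(rng.gauss(task["mean_duration"], task["sd_duration"]), 0.0)
        if rng.random() < task["p_success"]:
            next_state = dict(state, done=state["done"] + [task["name"]])
            reward = task["reward"]
        else:
            next_state = dict(state)   # a failed task leaves the state unchanged
            reward = 0.0
        return next_state, duration, reward

    rng = random.Random(0)
    task = {"name": "t1", "mean_duration": 2.0, "sd_duration": 0.5,
            "p_success": 0.9, "reward": 1.0}
    print(stm_step({"done": []}, task, rng))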
Sequential Sampling Plan with Fuzzy Parameters
In this paper, a new sequential sampling plan is introduced in which the acceptable quality level (AQL) and the lot tolerance percent defective (LTPD) are fuzzy numbers. The plan is well defined, since if the parameters are crisp it reduces to the classical plan. For such a plan, a particular table of rejection and acceptance is calculated and compared with the classical one. Keywords : St...
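The abstract notes that with crisp parameters the fuzzy plan reduces to the classical sequential sampling plan. As a point of reference, here is a sketch of that classical item-by-item plan, with acceptance and rejection lines derived from Wald's SPRT; the fuzzy generalization of AQL and LTPD is not reproduced, and the risk values below are illustrative.

    import math

    def sequential_plan(p1, p2, alpha, beta):
        # Wald-style acceptance/rejection lines for an item-by-item plan:
        # accept if d <= -h1 + s*n, reject if d >= h2 + s*n, else continue.
        g1 = math.log(p2 / p1)
        g2 = math.log((1 - p1) / (1 - p2))
        s = g2 / (g1 + g2)                                # common slope
        h1 = math.log((1 - alpha) / beta) / (g1 + g2)     # acceptance intercept
        h2 = math.log((1 - beta) / alpha) / (g1 + g2)     # rejection intercept
        def decide(n, d):
            # n items inspected so far, d of them defective
            if d <= -h1 + s * n:
                return "accept"
            if d >= h2 + s * n:
                return "reject"
            return "continue"
        return decide

    decide = sequential_plan(p1=0.01, p2=0.05, alpha=0.05, beta=0.10)
    for n, d in [(10, 0), (60, 1), (60, 5)]:
        print(n, d, decide(n, d))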
Combining Reinforcement Learning with Symbolic Planning
One of the major difficulties in applying Q-learning to real-world domains is the sharp increase in the number of learning steps required to converge towards an optimal policy as the size of the state space increases. In this paper we propose a method, PLANQ-learning, that couples a Q-learner with a STRIPS planner. The planner shapes the reward function, and thus guides the Q-learner quickly ...
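The abstract says the STRIPS planner shapes the Q-learner's reward. One standard way to realize planner-guided shaping is potential-based shaping, sketched below on a toy 1-D chain; this is an assumed reading, not necessarily PLANQ-learning's exact coupling, and plan_steps_to_goal is a hypothetical stand-in for the planner.

    import random

    GOAL = 9

    def plan_steps_to_goal(s):
        # Hypothetical planner estimate: remaining steps on a 1-D chain.
        return GOAL - s

    def phi(s):
        # Potential: progress according to the planner's plan length.
        return -plan_steps_to_goal(s)

    alpha, gamma, eps = 0.5, 0.95, 0.1
    actions = (-1, 1)
    Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in actions}
    rng = random.Random(1)

    for episode in range(200):
        s = 0
        while s != GOAL:
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda a: Q[(s, a)])
            s2 = min(max(s + a, 0), GOAL)
            r = 1.0 if s2 == GOAL else 0.0
            # Potential-based shaping: F(s, s') = gamma * phi(s') - phi(s).
            shaped = r + gamma * phi(s2) - phi(s)
            if s2 == GOAL:
                target = shaped
            else:
                target = shaped + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2

    print("greedy action at s=0:", max(actions, key=lambda a: Q[(0, a)]))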
Importance Sampling for Reinforcement Learning with Multiple Objectives
This thesis considers three complications that arise from applying reinforcement learning to a real-world application. In the process of using reinforcement learning to build an adaptive electronic market-maker, we find the sparsity of data, the partial observability of the domain, and the multiple objectives of the agent to cause serious problems for existing reinforcement learning algorithms....
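Importance sampling, the tool named in the title, lets returns gathered under one (behavior) policy estimate the value of another (target) policy by reweighting each sample by the ratio of action probabilities. A minimal sketch on a two-action bandit, with all numbers illustrative:

    import random

    rng = random.Random(0)
    b  = {0: 0.5, 1: 0.5}     # behavior policy: collects the data
    pi = {0: 0.2, 1: 0.8}     # target policy we want to evaluate
    true_mean = {0: 0.0, 1: 1.0}

    n, total = 10000, 0.0
    for _ in range(n):
        a = 0 if rng.random() < b[0] else 1
        reward = rng.gauss(true_mean[a], 1.0)
        # Reweight the sampled reward by pi(a)/b(a).
        total += (pi[a] / b[a]) * reward
    print("importance-sampling estimate:", total / n)   # expect about 0.8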
Reinforcement Learning with Polynomial Learning Rate in Parameterized Models
We consider reinforcement learning in a parameterized setup, where the model is known to belong to a finite set of Markov Decision Processes (MDPs) under the discounted return criterion. We propose an on-line algorithm for learning in such parameterized models, the Parameter Elimination (PEL) algorithm, and analyze its performance in terms of the total number of mistakes. The algorithm relies on Wald’s s...
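Although the abstract is cut off, PEL is said to rely on Wald's sequential testing to prune candidate models. The sketch below shows the general elimination idea on a deliberately simplified family (Bernoulli parameters standing in for full MDPs); the threshold and structure are assumptions, not the paper's algorithm.

    import math, random

    rng = random.Random(2)
    # Candidate "models": Bernoulli parameters standing in for full MDPs.
    loglik = {0.2: 0.0, 0.5: 0.0, 0.8: 0.0}
    true_p = 0.8
    threshold = math.log(100)      # Wald-style elimination threshold
    t = 0
    while len(loglik) > 1:
        t += 1
        x = 1 if rng.random() < true_p else 0
        for p in loglik:
            loglik[p] += math.log(p if x else 1 - p)
        best = max(loglik.values())
        # Eliminate any candidate whose likelihood falls too far behind.
        loglik = {p: ll for p, ll in loglik.items() if best - ll < threshold}
    print("surviving parameter:", next(iter(loglik)), "after", t, "samples")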
Journal
Journal title: Neuropsychologia
Year: 2020
ISSN: 0028-3932
DOI: 10.1016/j.neuropsychologia.2019.107261